For ONVIF TTS audio proposal, to support device with TTS function #694
Peggy0422 wants to merge 21 commits into development
1. Added the AddTTSAudioClip request and AddTTSAudioClip response for sending a text and its TTS configuration to the device (1621-1652)(2036-2041)(2418-2422)(2935-2943).
2. Added the complex type "TTS Audio" (1465-1485) for TTSConfiguration to support the TTS function. It includes the parameters Content, Language and VoiceType.
3. Updated AudioClipCapabilities with TTSCapabilities (177-181), and added the complex type TTSCapabilities (201-220) to indicate that the device supports the TTS function and its corresponding configuration. The complex type TTSCapabilities includes MaxContentLength, TTSLanguage and TTSVoiceType.
4. Added the simpleTypes TTSLanguage (220-231) and TTSVoiceType (232-238).
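To illustrate how the new types fit together, here is a minimal Python sketch of a client-side pre-check before sending AddTTSAudioClip. It assumes that MaxContentLength counts characters and that TTSCapabilities exposes the supported languages and voice types as lists; the class and function names and the example voice types are hypothetical and not part of the ONVIF schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TTSCapabilities:
    """Hypothetical client-side mirror of tr2:TTSCapabilities."""
    max_content_length: int  # assumption: measured in characters
    languages: List[str] = field(default_factory=list)
    voice_types: List[str] = field(default_factory=list)


@dataclass
class TTSConfiguration:
    """Hypothetical mirror of the TTS configuration (Content, Language, VoiceType)."""
    content: str
    language: str
    voice_type: str


def check_tts_request(config: TTSConfiguration,
                      caps: Optional[TTSCapabilities]) -> List[str]:
    """Return a list of problems; an empty list means the request looks acceptable."""
    if caps is None:
        # TTSCapabilities is optional: if it is absent, the device does not
        # advertise TTS support, so there is nothing to check against.
        return ["device does not advertise TTSCapabilities"]
    problems = []
    if len(config.content) > caps.max_content_length:
        problems.append("content exceeds MaxContentLength")
    if config.language not in caps.languages:
        problems.append("unsupported language: " + config.language)
    if config.voice_type not in caps.voice_types:
        problems.append("unsupported voice type: " + config.voice_type)
    return problems
```

For example, with `TTSCapabilities(200, ["en", "en-US"], ["Female", "Male"])`, a request for "en-US" with voice type "Female" passes, while "fr" is reported as unsupported. Checking on the client does not replace validation on the device; it only avoids sending requests that the advertised capabilities already rule out.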
1. Added detailed descriptions of the AddTTSAudioClip operation, explaining its purpose, parameters and response (2359-2416).
2. Updated the audio clip capabilities with TTSCapabilities (2698-2700).
update code line information for TTS function
correct some editorial errors
Updated the description of the AddTTSAudioClip operation to clarify the parameters and response. Updated the description of TTSCapabilities.
The TTS audio clip pull request was first created as #668.
Updated TTS configuration description and added TTSCapabilities entry.
Old PR for reference.
doc/Media2.xml
</varlistentry>
</variablelist>
<para></para>
<para><emphasis role="bold">Note:</emphasis> Audio clip uploads to the device can fail in the following scenarios, and a specific HTTP error code should be returned to the client when an upload fails.</para>
This note does not seem applicable to TTSAudioClip.
Yes, it is not for TTS; I will delete it.
Delete the inappropriate note for the AddTTSAudioClip operation
johado left a comment
Some small textual comments.
doc/Media2.xml
<title>AddTTSAudioClip</title>
<para>This operation adds a text, audio clip configuration and TTS configuration to the device, for device converting the text to an audio clip based on the TTS configuration.
The response to the command includes a unique token for this converted audio clip.
If the device is unable to support language specified in the TTS configuration, the associated configuration will deleted from the device.</para>
Add "be" to get "will be deleted".
doc/Media2.xml
<term>response</term>
<listitem>
<para role="param">Token - [tt:ReferenceToken]</para>
<para role="text">Unique token of the TTS audio clip to be uploaded.</para>
Change "to be uploaded" to "that was added"?
Thank you very much for your advice. We are considering using the word "assign", which should be more precise.
doc/Media2.xml
</varlistentry>
<varlistentry>
<term>TTSCapabilities</term>
<listitem><para>Indicates device supports TTS function and TTS configuration.See tr2: TTSCapabilities.</para></listitem>
Add a space after the period: "...configuration. See tr2:..."
wsdl/ver20/media/wsdl/media.wsdl
</xs:element>
<xs:element name="Language" type="xs:string">
<xs:annotation>
<xs:documentation>Language for the TTS audio clip playback. See tr2: TTSLanguage. </xs:documentation>
Change to "See tr2:TTSLanguage and TTSCapabilities."?
Thank you for your suggestion. TTSLanguage is already an attribute within TTSCapabilities. If we want to point out that the language for TTS audio clip playback must be one of the languages supported by the device, we could revise the explanation to state this explicitly, for example: "The language which is supported and used for TTS audio clip playback."
wsdl/ver20/media/wsdl/media.wsdl
</xs:element>
<xs:element name="VoiceType" type="xs:string">
<xs:annotation>
<xs:documentation>The voice type for the TTS audio clip playback. See tr2: TTSVoiceType.</xs:documentation>
Change to "See tr2:TTSVoiceType and TTSCapabilities."?
I propose updating the explanation for TTSVoiceType in the same way as the commit for TTSLanguage.
wsdl/ver20/media/wsdl/media.wsdl
<xs:sequence>
<xs:element name="Token" type="tt:ReferenceToken">
<xs:annotation>
<xs:documentation>Unique token of the TTS audio clip to be uploaded.</xs:documentation>
Change "to be uploaded" to something more relevant: converted, generated, ...?
Thank you very much for bringing this up. Yes, we are considering changing it to use the word "assign", which should be more precise.
wsdl/ver20/media/wsdl/media.wsdl
<xs:anyAttribute processContents="lax"/>
</xs:complexType>
<!--===============TTS Language================-->
<xs:simpleType name="TTSLanguage">
What is the reasoning behind the choice of languages in the list below?
Is there any standard for official language names that can be referred to?
TTSCapabilities and TTSAudio use open strings, so an enum should provide a good pattern.
Thank you so much for your comments! We truly appreciate your input and have been carefully considering how best to define these general concepts. Your mention of ISO international standards was particularly helpful and guided our further research. We also looked into RFC 5646 for language representation across countries. We would therefore like to use alpha-2 codes to represent languages and countries, as recommended in ISO 639-1 and ISO 3166-1. For languages with regional variations, we plan to adopt the language-country format (e.g., en-US, zh-CN). Thank you again for your feedback.
doc/Media2.xml
</itemizedlist>
</section>
</section>
<section xml:id="section_wvd_dzg_rye">
IDs should be unique in XML, right? This seems to be a copy of the SetAudioClip section below.
Yes, thank you for the suggestion. I have revised it accordingly.
update description for TTSLanguage and TTSVoiceType
Update documentation for Token element for AddTTSAudioClip response
Updated TTSLanguage type to include ISO language and country codes with documentation.
wsdl/ver20/media/wsdl/media.wsdl
See <a href="https://www.iso.org/obp/ui/">ISO Country Codes</a>.
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
Do we really need to make an explicit restriction here and not just define it as a string? If we go this way, whenever we need to add a language we need to update the WSDL file.
Thank you very much for your comment! Yes, this is an important issue we should consider.
Previously, we defined languages in string format and listed commonly used or potentially needed languages. However, this approach does introduce a maintenance burden: as you pointed out, each new language would require updating the WSDL file. To address this, we now reference ISO-standard language codes directly via strings. Users may refer to the official ISO codes for specific needs, while the WSDL only defines the reference rules. The examples in TTSLanguage are provided for convenience. I hope this clarifies the approach. Thank you again for your comment!
Added note about enumeration values being illustrative in TTSLanguage.
Revise the description of the language definition in TTSCapabilities and TTSAudio
kieran242 left a comment
A couple of requested changes
wsdl/ver20/media/wsdl/media.wsdl
<xs:annotation>
<xs:documentation>
List of supported languages. Uses ISO 639-1 alpha-2 language codes, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
The link "https://www.iso.org/obp/ui/" supplied to reference ISO 3166-1 does not take you to the standard; instead, it takes you to the following page:
Can we fix this reference, please? :)
Sure, I'll replace the link with the direct reference (https://www.iso.org/obp/ui/#search/code/) immediately. Thank you for pointing this out.
wsdl/ver20/media/wsdl/media.wsdl
<xs:documentation>
The language that is supported by the device and used for TTS audio clip playback.
Uses ISO 639-1 alpha-2 language codes for definition, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
As per my previous two comments above, please correct this here as well.
Understood, I have already corrected this section as well. Thank you for the reminder.
wsdl/ver20/media/wsdl/media.wsdl
<xs:annotation>
<xs:documentation>
List of supported languages. Uses ISO 639-1 alpha-2 language codes, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
Suggested change to the line above:
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166-1 Country Codes</a>.
Yes, that makes it more accurate. I've updated the relevant section. Thank you for your advice! :)
@Peggy0422 When you use the "AddTTSAudioClip" API, is the "TTSConfiguration" stored on the device with the audio clip, or just used to create the audio clip dynamically? If it is stored on the device, there is no way to update or identify it. Furthermore, when you request "GetAudioClips", there does not seem to be a way to tell an uploaded audio clip from a TTS audio clip other than by the audio clip token returned to the user by the API. This would make updating or deleting a TTS audio clip difficult without keeping track of the tokens and their "TTSConfiguration" in some way.
@kieran242 Thank you very much for your questions. Regarding the "AddTTSAudioClip" API, the "TTSConfiguration" is used solely for generating the audio clip and is not stored on the device. Typically, adding a TTS audio clip is the first step to enable playback on the device. When a client uses AddTTSAudioClip, the device returns a token in the AddTTSAudioClipResponse that corresponds to the generated TTS audio clip. This token serves as a unique identifier for subsequent operations, such as Get, Set or Delete.
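Since the TTSConfiguration is not stored on the device, a client that needs to tell TTS clips apart from uploaded clips has to keep its own record, as the question above suggests. A minimal Python sketch, assuming a hypothetical `add_tts_audio_clip` callable that sends the AddTTSAudioClip request and returns the assigned token; `TTSClipRegistry` and its methods are illustrative names, not part of the ONVIF API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class TTSConfiguration:
    """Hypothetical mirror of the TTS configuration (Content, Language, VoiceType)."""
    content: str
    language: str
    voice_type: str


class TTSClipRegistry:
    """Client-side record of TTS clips, keyed by the token the device assigns.

    The device keeps only the generated audio clip, so the client must remember
    which tokens came from AddTTSAudioClip and with which settings.
    """

    def __init__(self, add_tts_audio_clip: Callable[[TTSConfiguration], str]):
        # Placeholder for whatever sends AddTTSAudioClip and returns the Token.
        self._send = add_tts_audio_clip
        self._configs: Dict[str, TTSConfiguration] = {}

    def add(self, config: TTSConfiguration) -> str:
        token = self._send(config)
        self._configs[token] = config
        return token

    def forget(self, token: str) -> None:
        # Call after the clip has been deleted on the device.
        self._configs.pop(token, None)

    def tts_tokens(self, device_tokens: Iterable[str]) -> List[str]:
        """Filter device tokens (e.g. from GetAudioClips) down to TTS-generated ones."""
        return [t for t in device_tokens if t in self._configs]

    def config_for(self, token: str) -> TTSConfiguration:
        return self._configs[token]
```

The registry only knows about clips this client created; clips added by other clients, or before the registry existed, cannot be classified this way.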
update the reference link for country code
To support audio products with a TTS function, the following changes are made:
Added TTSCapabilities (optional): indicates whether the device is capable of the TTS function and its corresponding TTS configuration. The complex type "TTSCapabilities" is therefore added to the existing complex type "AudioClipCapabilities".
Parameter:
Parameter:
Response:
media2.wsdl
media2.xml and documentation